{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X80i_girFR2o"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "bB8gHCR3FVC0"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Z17OmgavQfp4"
      },
      "source": [
        "# Recommending movies: retrieval with distribution strategy\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/recommenders/examples/diststrat_retrieval\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/diststrat_retrieval.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/recommenders/blob/main/docs/examples/diststrat_retrieval.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/diststrat_retrieval.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kCeYA79m1DEX"
      },
      "source": [
        "In this tutorial, we're going to train the same retrieval model as we did in the [basic retrieval](basic_retrieval) tutorial, but with distribution strategy.\n",
        "\n",
        "We're going to:\n",
        "\n",
        "1. Get our data and split it into a training and test set.\n",
        "2. Set up two virtual GPUs and TensorFlow MirroredStrategy.\n",
        "3. Implement a retrieval model using MirroredStrategy.\n",
        "4. Fit it with MirrorredStrategy and evaluate it.\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sawo1x8kQk9b"
      },
      "source": [
        "## Imports\n",
        "\n",
        "\n",
        "Let's first get our imports out of the way."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0vJOdh9WbTpd"
      },
      "outputs": [],
      "source": [
        "!pip install -q tensorflow-recommenders\n",
        "!pip install -q --upgrade tensorflow-datasets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SZGYDaF-m5wZ"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import pprint\n",
        "import tempfile\n",
        "\n",
        "from typing import Dict, Text\n",
        "\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BxQ_hy7xPH3N"
      },
      "outputs": [],
      "source": [
        "import tensorflow_recommenders as tfrs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5PAqjR4a1RR4"
      },
      "source": [
        "## Preparing the dataset\n",
        "\n",
        "We prepare the dataset in exactly the same way as we do in the [basic retrieval](basic_retrieval) tutorial."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aaQhqcLGP0jL"
      },
      "outputs": [],
      "source": [
        "# Ratings data.\n",
        "ratings = tfds.load(\"movielens/100k-ratings\", split=\"train\")\n",
        "# Features of all the available movies.\n",
        "movies = tfds.load(\"movielens/100k-movies\", split=\"train\")\n",
        "\n",
        "for x in ratings.take(1).as_numpy_iterator():\n",
        "  pprint.pprint(x)\n",
        "\n",
        "for x in movies.take(1).as_numpy_iterator():\n",
        "  pprint.pprint(x)\n",
        "\n",
        "ratings = ratings.map(lambda x: {\n",
        "    \"movie_title\": x[\"movie_title\"],\n",
        "    \"user_id\": x[\"user_id\"],\n",
        "})\n",
        "movies = movies.map(lambda x: x[\"movie_title\"])\n",
        "\n",
        "tf.random.set_seed(42)\n",
        "shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)\n",
        "\n",
        "train = shuffled.take(80_000)\n",
        "test = shuffled.skip(80_000).take(20_000)\n",
        "\n",
        "movie_titles = movies.batch(1_000)\n",
        "user_ids = ratings.batch(1_000_000).map(lambda x: x[\"user_id\"])\n",
        "\n",
        "unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))\n",
        "unique_user_ids = np.unique(np.concatenate(list(user_ids)))\n",
        "\n",
        "unique_movie_titles[:10]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8Qk-r3dNvreA"
      },
      "source": [
        "## Set up two virtual GPUs\n",
        "\n",
        "If you have not added GPU accelerators to your Colab, please disconnect the Colab runtime and do it now. We need the GPU to run the code below:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "09AZOFL8wNVD"
      },
      "outputs": [],
      "source": [
        "gpus = tf.config.list_physical_devices(\"GPU\")\n",
        "if gpus:\n",
        "  # Create 2 virtual GPUs with 1GB memory each\n",
        "  try:\n",
        "    tf.config.set_logical_device_configuration(\n",
        "        gpus[0],\n",
        "        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),\n",
        "         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])\n",
        "    logical_gpus = tf.config.list_logical_devices(\"GPU\")\n",
        "    print(len(gpus), \"Physical GPU,\", len(logical_gpus), \"Logical GPUs\")\n",
        "  except RuntimeError as e:\n",
        "    # Virtual devices must be set before GPUs have been initialized\n",
        "    print(e)\n",
        "\n",
        "strategy = tf.distribute.MirroredStrategy()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eCi-seR86qqa"
      },
      "source": [
        "## Implementing a model\n",
        "\n",
        "We implement the user_model, movie_model, metrics and task in the same way as we do in the [basic retrieval](basic_retrieval) tutorial, but we wrap them in the distribution strategy scope:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kHQZJEhXP93N"
      },
      "outputs": [],
      "source": [
        "embedding_dimension = 32\n",
        "\n",
        "with strategy.scope():\n",
        "  user_model = tf.keras.Sequential([\n",
        "    tf.keras.layers.StringLookup(\n",
        "        vocabulary=unique_user_ids, mask_token=None),\n",
        "    # We add an additional embedding to account for unknown tokens.\n",
        "    tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)\n",
        "  ])\n",
        "\n",
        "  movie_model = tf.keras.Sequential([\n",
        "    tf.keras.layers.StringLookup(\n",
        "        vocabulary=unique_movie_titles, mask_token=None),\n",
        "    tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)\n",
        "  ])\n",
        "\n",
        "  metrics = tfrs.metrics.FactorizedTopK(\n",
        "    candidates=movies.batch(128).map(movie_model)\n",
        "  )\n",
        "\n",
        "  task = tfrs.tasks.Retrieval(\n",
        "    metrics=metrics\n",
        "  )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FZUFeSlWRHGx"
      },
      "source": [
        "We can now put it all together into a model. This is exactly the same as in the [basic retrieval](basic_retrieval) tutorial."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8n7c5CHFp0ow"
      },
      "outputs": [],
      "source": [
        "class MovielensModel(tfrs.Model):\n",
        "\n",
        "  def __init__(self, user_model, movie_model):\n",
        "    super().__init__()\n",
        "    self.movie_model: tf.keras.Model = movie_model\n",
        "    self.user_model: tf.keras.Model = user_model\n",
        "    self.task: tf.keras.layers.Layer = task\n",
        "\n",
        "  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -\u003e tf.Tensor:\n",
        "    # We pick out the user features and pass them into the user model.\n",
        "    user_embeddings = self.user_model(features[\"user_id\"])\n",
        "    # And pick out the movie features and pass them into the movie model,\n",
        "    # getting embeddings back.\n",
        "    positive_movie_embeddings = self.movie_model(features[\"movie_title\"])\n",
        "\n",
        "    # The task computes the loss and the metrics.\n",
        "    return self.task(user_embeddings, positive_movie_embeddings)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yDN_LJGlnRGo"
      },
      "source": [
        "## Fitting and evaluating\n",
        "\n",
        "Now we instantiate and compile the model within the distribution strategy scope.\n",
        "\n",
        "Note that we are using Adam optimizer here instead of Adagrad as in the [basic retrieval](basic_ranking) tutorial since Adagrad is not supported here."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aW63YaqP2wCf"
      },
      "outputs": [],
      "source": [
        "with strategy.scope():\n",
        "  model = MovielensModel(user_model, movie_model)\n",
        "  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nma0vc2XdN5g"
      },
      "source": [
        "Then shuffle, batch, and cache the training and evaluation data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "53QJwY1gUnfv"
      },
      "outputs": [],
      "source": [
        "cached_train = train.shuffle(100_000).batch(8192).cache()\n",
        "cached_test = test.batch(4096).cache()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u8mHTxKAdTJO"
      },
      "source": [
        "Then train the  model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZxPntlT8EFOZ"
      },
      "outputs": [],
      "source": [
        "model.fit(cached_train, epochs=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YsluR8audV9W"
      },
      "source": [
        "You can see from the training log that TFRS is making use of both virtual GPUs.\n",
        "\n",
        "Finally, we can evaluate our model on the test set:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "W-zu6HLODNeI"
      },
      "outputs": [],
      "source": [
        "model.evaluate(cached_test, return_dict=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JKZyP9A1dxit"
      },
      "source": [
        "This concludes the retrieval with distribution strategy tutorial."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "diststrat_retrieval.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
